REPORT DOCUMENTATION PAGE

1. AGENCY USE ONLY

2. REPORT DATE

April 1997

3. REPORT TYPE AND DATES COVERED

Final Technical, 15 Sep 92 - 15 Oct 96

4. TITLE AND SUBTITLE

Perception of Complex Auditory Patterns

5. FUNDING NUMBERS

F49620-92-J-0506 C

6. AUTHOR(S)

Charles S. Watson, Principal Investigator
Gary R. Kidd, Investigator

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Department of Speech and Hearing Sciences and Department of Psychology
Indiana University
Bloomington, IN 47405

8. PERFORMING ORGANIZATION REPORT NUMBER

AFOSR-TR-97

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Air Force Office of Scientific Research
110 Duncan Avenue, Suite B115
Bolling Green Air Force Base
Washington, DC 20332

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

19910602 123

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION/AVAILABILITY STATEMENT

Approved for public release; distribution is unlimited.

12b. DISTRIBUTION CODE

13.  ABSTRACT  (Maximum  200  words) 

This  report  summarizes  work  accomplished  under  the  support  of  AFOSR  Grant  No.  F49620-92-J-0506  between 
September  15, 1992  and  October  15, 1996.  Two  independent  laboratories  at  Indiana  University  contributed  to  this  work, 
the  Hearing  and  Communication  Laboratory  in  the  Department  of  Speech  and  Hearing  Sciences,  directed  by  C.  S. 
Watson,  and  the  Auditory  Research  Laboratory  in  the  Department  of  Psychology,  directed  by  D.  E.  Robinson.  The 
research  includes  experiments  on  the  discrimination  and  identification  of  a  variety  of  complex  sounds  including 
sequences  of  tones,  spectrally  shaped  waveforms,  gaussian  noise  samples,  speech  sounds,  and  familiar  environmental 
sounds.  This  work  has  identified  and  examined  several  factors  that  influence  the  ability  to  discriminate  and  identify 
these  kinds  of  sounds.  These  factors  include  stimulus  uncertainty,  the  proportion  of  the  total  duration  of  a  sound  that  is 
subject  to  change,  the  temporal  location  of  a  change  within  a  sound,  and  the  details  of  a  sound’s  spectral-temporal 
structure.  Individual  differences  in  auditory  abilities  and  correlations  among  these  abilities  have  also  been  examined, 
with  several  studies  demonstrating  very  low  correlations  between  listeners’  abilities  to  hear  the  details  of  speech  and 
nonspeech  sounds. 


14.  SUBJECT  TERMS 

complex  sounds,  auditory  patterns,  individual  differences,  auditory  perception,  hearing 


15.  NUMBER  OF  PAGES 
10 


16.  PRICE  CODE 


17.  SECURITY  CLASSIFICATION 
OF  REPORT  Unclassified 


18. SECURITY CLASSIFICATION
OF  THIS  PAGE  Unclassified 


19.  SECURITY  CLASSIFICATION 
OF  ABSTRACT  Unclassified 


20.  LIMITATION  OF  ABSTRACT 

UL 


Work  Accomplished 
A.  Tonal  Sequences 

1.  The  proportion-of-the-total-duration  (PTD)  rule.  Kidd,  Watson 

We  have  extended  our  examination  of  the  role  of  proportional  duration  in  auditory  pattern 
discrimination  to  assess  the  generality  of  the  PTD  rule.  Earlier  work  (Kidd  &  Watson,  1992)  has 
shown  that  each  individual  component  of  an  unfamiliar  sequence  of  tones  is  resolved  with  an 
accuracy  that  increases  monotonically  with  the  component’s  proportion  of  the  total  duration  of  the 
sequence  (the  PTD  rule).  This  work  was  extended  to  the  case  of  duration  discrimination.  In  this 
case,  the  dimension  affected  by  changes  in  PTD  (i.e.,  time)  is  also  the  primary  dimension  of  variation 
within  the  patterns,  as  well  as  the  dimension  to  which  listeners  must  attend  to  perform  the  task. 
Listeners  were  asked  to  detect  a  change  in  the  duration  of  a  single  tone  in  a  five-tone  pattern  using  a 
modified  two-alternative  forced  choice  procedure.  Target-tone  durations  were  determined  by  the 
PTD value (0.1, 0.2, or 0.4) and the total pattern duration (250 msec or 750 msec). Context-tone
durations  were  determined  randomly  on  each  trial.  A  single  frequency  pattern,  consisting  of  a 
sequence  of  ascending  frequencies,  was  used  throughout  the  experiment.  The  pattern  of  results 
obtained after several thousand training trials was essentially the same as that found in the
frequency-discrimination experiments. Increases in the proportion of the total pattern duration occupied by the
target  tone  consistently  resulted  in  lower  duration-discrimination  thresholds. 
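
As a worked illustration of the stimulus arithmetic described above, the following Python sketch computes target-tone durations as the PTD value times the total pattern duration and divides the remaining time among the four context tones. The report states only that context-tone durations were determined randomly; the uniform random split, the 10-ms minimum context duration, and all function names are assumptions made here for illustration, not the procedure actually used.

```python
import random


def target_duration_ms(ptd, total_ms):
    """Target-tone duration implied by a PTD value and a total pattern duration."""
    return ptd * total_ms


def random_context_durations(ptd, total_ms, n_context=4, min_ms=10.0, rng=random):
    """Split the non-target time among the context tones at random.

    ASSUMPTION: a uniform random partition with a minimum duration per tone;
    the report does not say how the random durations were drawn.
    """
    remaining = total_ms - target_duration_ms(ptd, total_ms)
    spare = remaining - n_context * min_ms
    if spare < 0:
        raise ValueError("not enough time left for the context tones")
    # Random cut points partition the spare time into n_context pieces.
    cuts = sorted(rng.uniform(0.0, spare) for _ in range(n_context - 1))
    edges = [0.0] + cuts + [spare]
    return [min_ms + (hi - lo) for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    for total in (250, 750):
        for ptd in (0.1, 0.2, 0.4):
            print(total, ptd, target_duration_ms(ptd, total))
```

For the conditions listed above, this arithmetic gives nominal target durations of 25, 50, and 100 msec for the 250-msec patterns and 75, 150, and 300 msec for the 750-msec patterns.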

We  have  also  evaluated  a  related  proposal  by  Robert  Lutfi  (1993)  which  suggests  that 
proportional  variance,  rather  than  proportional  duration,  accounts  for  the  PTD  results.  Our  analysis 
reveals  that  the  variance-based  explanation  of  the  PTD  rule  cannot  account  for  many  findings  in  the 
auditory-pattern-perception literature because it relies on a decision variable that preserves little or no
information  about  aspects  of  pattern  structure  that  influence  pattern  discriminability.  Although 
variance  clearly  influences  auditory  pattern  discrimination  in  many  circumstances,  the  proportional 
duration  of  a  target  tone  appears  to  affect  discriminability  because  of  the  influence  of  the  temporal 
structure of a given pattern on attentional focusing within that pattern, independent of any effects due
to  component  variability  across  patterns.  (Partially  supported  by  NIH.) 


2.  Use  of  the  psychophysical  method  of  adjustment  in  tonal  pattern  discrimination. 

Watson,  Kidd,  Aimee  Surprenant,  Ward  R.  Drennan 

A  difficulty  in  tonal-pattern  research  is  that  several  thousand  trials  are  typically  required  to 
approach  asymptotic  discrimination  performance  under  minimal-uncertainty  testing  conditions.  One 
solution  to  this  problem  is  to  use  the  method  of  adjustment  to  determine  thresholds,  rather  than  a 
forced-choice psychophysical method. This study demonstrated that a listener using the method of
adjustment needs only extremely brief times (minutes, as opposed to hours with a forced-choice
method) to achieve perceptual isolation of single components of a multi-tone pattern.
A quantitative criterion for “perceptual isolation” is reached when a frequency match
is  made  that  is  as  close  to  the  standard  as  can  be  achieved  when  the  standard  and  variable  tones  are 
both  presented  in  isolation,  rather  than  in  pattern  contexts.  Not  all  adjustments  are  this  accurate, 
however.  The  most  useful  distinction  between  difficult  and  easy  adjustments  is  shown  to  be  the 
percent  of  all  the  adjustments,  for  a  given  combination  of  target  and  context  tones,  that  meet  this 
perceptual-isolation  criterion.  (Partially  supported  by  NIH.) 




3.  Properties  of  the  structure  of  multi-tone  sequential  patterns  that  determine  the  difficulty  of 
perceptually isolating single target components. Watson, Kidd, Aimee Surprenant, Ward R. Drennan

A  method  of  adjustment  was  used  to  establish  the  importance  of  each  of  several  structural 
properties  of  the  context  tones,  in  nine-tone  sequences,  in  determining  the  perceptual  isolability  of 
target components. Successful “perceptual isolation” of a target tone was assumed to be achieved
when frequency matches were as accurate as those achieved for tones presented in isolation,
generally meaning matching errors of less than 1%-2% for the 50-ms tones in these sequences. The
context  property  that  was  found  to  primarily  affect  the  frequency  matches  was  the  separation,  in  Hz, 
between  the  target  tone  and  both  the  local  and  (to  a  lesser  degree)  the  remote  context  tones.  Other 
than  its  bandwidth,  the  form  of  the  local  pitch  contour  (the  target  tone  plus  the  single  tones 
immediately  before  and  after  it)  had  no  clear  effect  on  the  ability  to  “hear  out”  the  target  tone,  i.e., 
whether  the  local  context  was  ascending,  descending,  concave  up,  or  concave  down.  The  contours  of 
the  remote  context  tones  (first  and  last  three  in  the  patterns)  likewise  had  no  effect  on  performance. 
Performance  ranged  from  25%  target  tones  isolated  for  the  most  difficult  conditions  to  90%  for  the 
easiest.  (Partially  supported  by  NIH.) 
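
The summary measure used in these adjustment studies, the percentage of matches that meet the perceptual-isolation criterion, can be sketched as follows. This is a minimal illustration only: the function name, the 2% default criterion (taken from the upper end of the 1%-2% range quoted above), and the sample matches are assumptions, not values or code from the experiments.

```python
def percent_isolated(matches_hz, standard_hz, criterion_pct=2.0):
    """Percent of frequency matches whose error is within the criterion.

    ASSUMPTION: error is expressed as a percentage of the standard frequency,
    and a match counts as "isolated" when that error is <= criterion_pct.
    """
    if not matches_hz:
        raise ValueError("at least one match is required")
    hits = sum(
        1
        for m in matches_hz
        if abs(m - standard_hz) / standard_hz * 100.0 <= criterion_pct
    )
    return 100.0 * hits / len(matches_hz)


if __name__ == "__main__":
    # Hypothetical matches to a 1000-Hz target tone.
    print(percent_isolated([1000.0, 1010.0, 1050.0, 990.0], 1000.0))
```

Computed per combination of target and context tones, a score of this kind is what ranges from 25% to 90% across the conditions described above.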

4.  The  effects  of  training  method  on  frequency  discrimination  for  individual  components  of  complex 
tonal  patterns.  Robert  F.  Port,  Catherine  L.  Rogers,  Watson,  Kidd 

It  has  been  assumed  that  subjects  trained  to  detect  increments  in  the  frequency  of  all  components 
of  complex  tonal  patterns  (broad  focus)  would  be  less  accurate  in  detecting  changes  in  a  single  target 
tone  than  subjects  who  have  been  trained  to  detect  changes  in  only  that  component  [e.g.,  Watson  et 
al., J. Acoust. Soc. Am. 60, 1176-1186 (1976)]. In several experiments, using a number of 750-ms
ten-tone  patterns,  subjects  were  trained  using  one  of  three  methods:  in  the  first  two,  a  S/2AFC 
procedure  was  used  to  train  subjects  to  detect  frequency  increments  in  a  specific  target  tone  (group 
one)  or  to  detect  frequency  increments  that  could  occur  in  any  of  the  ten  components  (group  two), 
and  in  the  third,  subjects  were  trained  only  to  identify  the  individual  patterns.  Subjects  trained  using 
these  methods  were  tested  on  their  ability  to  detect  changes  in  various  components  of  the  patterns, 
including  the  target  tone  for  the  first  group.  In  all  of  these  experiments,  only  very  slight  differences 
in  performance  were  found  among  the  different  groups.  These  results  suggest  that  lengthy  experience 
with  a  given  pattern  allows  a  listener  to  discriminate  small  differences  in  frequency  in  any  of  the 
individual  components  of  that  pattern,  relatively  independent  of  the  nature  of  that  experience. 
[Additional  support  from  ONR,  R.  Port  Principal  Investigator.] 

5.  Selective  attention  to  spectral-temporal  regions  of  auditory  patterns.  Watson,  Li,  Kidd,  Zheng. 

In  an  extension  of  the  experiment  described  above,  listeners  were  trained  to  attend  to  different 
spectral-temporal regions of auditory patterns (ten 50-ms tones, 300 Hz to 3 kHz in frequency) under high
uncertainty  (a  different  pattern  on  each  trial).  One  group  of  listeners  was  trained  to  discriminate 
changes  in  the  early  low-frequency  region  of  the  patterns,  a  second  group  in  the  late  high-frequency 
regions,  and  a  control  group  was  trained  with  changes  occurring  throughout  the  patterns.  Training 
was  conducted  for  ten  sessions,  followed  by  testing  at  all  spectral-temporal  positions.  Effects  of 
selective  training  were  much  more  substantial  than  in  the  earlier  four-pattern  experiment.  Under  the 
high  stimulus-uncertainty  conditions,  attentional  training  yields  the  predicted  results:  Discrimination 
is  relatively  improved  for  trained,  compared  to  untrained  regions.  An  unexpected  result  was  that  the 
control  group’s  performance  was  significantly  more  accurate  than  that  of  the  group  trained  to  attend 
to  early,  low-frequency  tones.  It  is  possible  that  efforts  to  selectively  attend  to  some  spectral- 
temporal  region  of  an  unfamiliar  pattern  may  reduce  overall  discrimination  performance,  compared 
to  that  achieved  when  listening  for  any  change  in  the  pattern.  (Partially  supported  by  NIH.) 




B.  Spectrally  Complex  Sounds 

1.  Effects  of  spectral  and  temporal  uncertainty  on  the  detection  of  increments  in  the  level  of 
individual  tonal  components  of  "profile"  stimuli.  Watson,  Xiafeng  Li. 

In  a  modification  of  previous  profile  discrimination  experiments  (summarized  in  Profile 
Analysis, D. M. Green, Oxford University Press, 1988), intensity increments were introduced, in
random  order,  at  one  of  ten  temporal  positions  during  the  overall  duration  of  eleven-tone  profiles, 
and  at  one  of  the  eleven  frequencies  (i.e.,  a  medium  level  of  stimulus  uncertainty).  The  tonal 
components  were  equi-log  spaced,  and  five  different  component  ranges  were  investigated. 

Differential  detectability  of  the  intensity  increments  over  the  spectral-temporal  ranges  of  the  profiles 
is  remarkably  similar  to  the  temporal  and  spectral  distribution  of  selective  attention  to  individual 
tonal  components  of  multi-tone  patterns  (Watson  and  Li,  1992;  Watson,  Kelly,  and  Wroton,  1976). 

In  contrast  to  the  tonal-pattern  experiments,  when  the  intensity-increment  task  was  repeated  under 
minimal-uncertainty  conditions,  performance  improved  very  little  compared  to  that  measured  under 
higher  uncertainty.  It  seems  likely  that  the  reason  is  that  the  large  effects  of  stimulus  uncertainty  are 
obtained  only  when  the  contextual  components  as  well  as  the  spectral-temporal  targets  are  varied 
from  trial  to  trial. 

2.  Discrimination  of  static  versus  dynamic,  and  log  versus  harmonic  profiles.  C.  Watson,  W. 
Drennan. 

"Profile"  stimuli  consisting  of  multiple  simultaneous  fixed-frequency  sinusoidal  components  are 
more  representative  of  naturally  occurring  sounds  than  the  spectrally  simpler  waveforms  more  often 
used  in  psychoacoustic  experiments.  However,  most  naturally  occurring  sounds  are  characterized  by 
dynamic  rather  than  static  spectra,  and  by  harmonically  spaced  rather  than  the  log-spaced 
components  used  in  most  profile  experiments.  In  preparing  to  study  a  variety  of  dynamic  profiles,  the 
discriminability  of  static  and  frequency-glide  profiles  was  determined,  using  both  log-  and 
harmonically  spaced  components.  Discriminations  were  based  on  the  detection  of  an  intensity 
increment added to the mid-frequency component of 11-component, 400-ms profiles. Each profile
had  a  starting  frequency  range  of  200  to  2000  Hz.  Dynamic  profiles  increased  in  frequency 
continuously  over  their  400-ms  durations.  Each  subject  was  run  under  four  stimulus  conditions 
(static-log,  static-harmonic,  dynamic-log,  dynamic-harmonic)  for  a  minimum  of  2000  trials  per 
condition,  in  an  adaptive-tracking  procedure.  Mean  differences  between  asymptotic  thresholds  for  the 
stimulus  conditions  were  small  compared  to  differences  among  the  subjects.  Harmonic  spacing 
yielded  somewhat  lower  thresholds  than  did  the  log-spaced  components,  while  very  modest 
differences,  if  any,  were  found  between  static  and  dynamic  profiles. 
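
To make the log-spaced case concrete, the sketch below computes component frequencies for an 11-component profile spanning 200 to 2000 Hz with a constant frequency ratio between neighbors. The function name is hypothetical, and the report does not give the exact component frequencies, so this shows only what equal logarithmic spacing implies for that range.

```python
def log_spaced_frequencies(f_low_hz, f_high_hz, n_components):
    """Frequencies with a constant ratio between adjacent components."""
    if n_components < 2:
        raise ValueError("need at least two components")
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n_components - 1))
    return [f_low_hz * ratio ** i for i in range(n_components)]


if __name__ == "__main__":
    freqs = log_spaced_frequencies(200.0, 2000.0, 11)
    for i, f in enumerate(freqs):
        print(i, round(f, 1))
```

Under this spacing the mid-frequency component (the sixth of eleven) falls at about 632 Hz, the geometric mean of 200 and 2000 Hz.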

Because  of  the  variability  across  subjects  in  this  experiment,  a  group  of  forty-six  new  subjects 
was selected and tested for 1920 trials on a static-log profile. The distribution of thresholds for
these  subjects  was  skewed  to  the  right  slightly  but  was  roughly  normal  with  a  range  of  -2  to  -26  dB 
(signal  level  re  component  level).  No  evidence  was  found  to  suggest  that  profile-discrimination 
ability  is  bimodally  distributed,  as  some  have  suggested.  A  group  of  subjects  was  selected  from  each 
tail  of  the  distribution  and  tested  again  using  the  four  profiles  (static-log,  static-harmonic, 
dynamic-log,  dynamic-harmonic).  For  these  listeners,  static  profiles  yielded  slightly  better 
performance than the frequency-glide profiles, regardless of subject ability.

3. Determinants of the perceptual similarity of complex filter shapes. C. Watson and Y. Zheng.

Sounds  were  created  by  passing  an  increasing-frequency  300-ms  sawtooth  (120-170  Hz)  through 
each of 15 complex filters. The filters were created by varying two parameters of a pair of
overlapping  second-order  filters  (CF:  500  and  1500  Hz).  The  parameters  varied  were  the  width  of  the 
upper  filter  (Q:  1,  3,  8)  and  the  relative  amplitudes  of  the  two  filter  peaks  (+12,  +6,  0,  -6,  or  -12  dB). 




Listeners  judged  the  similarity  of  pairs  of  these  sounds,  equated  for  energy,  using  a  ten-point  scale.  A 
multidimensional  scaling  (MDS)  analysis  suggested  that  when  Q  of  the  upper  filter  was  low,  the 
sounds  were  distinguished  entirely  on  the  basis  of  the  relative  amplitudes  of  the  filters.  For  high 
upper-filter  Q  values,  a  different  intensity-related  dimension  was  the  basis  for  distinguishing  among 
the  sounds,  which  were  essentially  identical  on  dimension  No.  1.  The  two  dimensions  are  associated 
with  overall  pitch,  and  with  the  salience  of  the  high-frequency  spectral  peak.  Various  physical 
models have been fitted to the data. (Partially supported by NIH.)
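
The stimulus set in this experiment is the full crossing of three upper-filter Q values with five relative peak amplitudes, which gives the 15 filters. The short sketch below enumerates that design and converts the dB values to linear amplitude ratios. The names are hypothetical, and which peak is taken as the reference for the relative amplitude is not stated in this report, so the sign convention is an assumption.

```python
from itertools import product

UPPER_FILTER_Q = (1, 3, 8)               # width of the upper (1500-Hz) filter
RELATIVE_PEAK_DB = (12, 6, 0, -6, -12)   # relative amplitude of the two peaks


def amplitude_ratio(level_db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


def filter_conditions():
    """All Q-by-relative-amplitude combinations (3 x 5 = 15 filters)."""
    return [
        {"upper_q": q, "relative_db": db, "ratio": amplitude_ratio(db)}
        for q, db in product(UPPER_FILTER_Q, RELATIVE_PEAK_DB)
    ]


if __name__ == "__main__":
    for cond in filter_conditions():
        print(cond)
```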


C.  Individual  Differences 

i.  Individual  differences  in  speech  and  nonspeech  processing  among  normal-hearing  subjects.  C. 
Watson,  A.  Surprenant. 

Although  a  large  portion  of  the  variance  among  listeners  in  auditory  speech  processing  is 
associated  with  the  audibility  of  components  of  the  speech  waveform,  it  is  not  possible  to  predict 
individual  differences  in  speech  perception  strictly  from  the  audiogram.  Psychoacoustic  measures  of 
spectral-temporal acuity with nonspeech stimuli also have been shown to correlate only weakly (or
not  at  all)  with  speech  processing.  In  a  replication  and  extension  of  an  earlier  study  (Watson  et  al.,  J. 
Acoust. Soc. Am. Suppl. 1 71, S73), 100 normal-hearing college students were tested on speech
perception  tasks  (nonsense  syllables,  words,  sentences  in  a  noise  background)  and  on  6 
spectral-temporal  discrimination  tasks  using  simple  and  complex  nonspeech  sounds.  Factor  analysis 
showed  that  the  abilities  that  explain  performance  on  the  nonspeech  tasks  are  quite  distinct  from 
those that account for performance on the four speech tasks. Performance was significantly
correlated among speech tasks, and among nonspeech tasks. Either (a) auditory spectral-temporal
acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) we have yet to
identify  the  appropriate  task  or  types  of  nonspeech  stimuli  that  exercise  the  abilities  required  for 
speech  recognition. 

1.  TBAC II.  A  new  version  of  the  Test  of  Basic  Auditory  Capabilities.  C.  Watson,  G.  Kidd,  B.  Gygi. 

Several  psychoacoustic  tests  have  been  prepared  and  evaluated  for  a  new  version  of  the  TBAC,  a 
test  battery  originally  developed  by  Watson  et  al.  (1982).  The  goal  of  the  new  set  of  tests  is  to 
examine  abilities  that  may  not  have  been  adequately  tested  in  the  earlier  versions  of  the  test.  Some 
evidence  suggests  that  listeners  may  employ  more  holistic  listening  strategies  when  presented  with 
spectrally  complex  sounds  that  change  over  time.  Such  sounds  have  more  in  common  with  speech 
and  many  familiar  nonspeech  sounds  than  do  the  simpler  sounds  and  tonal  sequences  included  in  the 
current  TBAC. 

The  following  tests  have  been  added  to  the  new  TBAC.  All  tests  follow  the  same  general 
procedures  as  used  with  the  current  version  of  the  TBAC.  The  first  four  tests  employ  a  modified 
two-alternative  forced  choice  paradigm  (S/2AFC),  and  the  familiar-sound  identification  test  utilizes  a 
three-alternative  forced-choice  identification  paradigm. 

1.  Detection  of  amplitude  change  over  frequency:  Measures  threshold  ripple  depth  for  subjects  to 
discriminate  between  rippled  noise  and  flat-spectrum  noise. 

2.  Detection  of  amplitude  change  over  time:  (Temporal  modulation  transfer  function.)  Measures 
modulation  detection  thresholds  at  various  carrier  frequencies. 

3.  Gap  detection:  Measures  threshold  duration  for  detection  of  a  gap  in  noise. 

4.  Gap  discrimination:  Measures  threshold  gap-duration  difference  for  detection  of  a  difference  in 
gap  durations  in  successive  bursts  of  noise. 




5. Identification of familiar sounds in noise: Measures threshold signal-to-noise ratios for the
identification  of  familiar  sounds  in  noise.  Thirty  common  sounds  were  taken  from  a  high-quality 
digital  sound-effects  library.  All  sounds  were  quite  recognizable  in  the  absence  of  noise. 

The  range  of  performance  on  these  new  tasks  is  quite  consistent  with  the  older  TBAC  tests. 
Preliminary analyses indicate that tests 1, 3, and 4 are highly correlated with most of the older
nonspeech  TBAC  tests,  while  tests  2  and  5  appear  to  be  somewhat  less  so.  None  of  the  new  tests  is 
significantly  correlated  with  the  speech  tests.  (Partially  supported  by  NIH.) 

2.  Identification  of  familiar  sounds.  C.  Watson,  G.  Kidd,  B.  Gygi. 

Most  studies  of  auditory  recognition  and  identification  have  employed  either  speech  stimuli  or 
nonspeech  sounds  generated  in  the  laboratory  (e.g.,  tones  of  various  frequencies,  tonal  patterns,  click 
trains).  The  present  study  employed  25  naturally  occurring  complex  sounds  (obtained  from  a 
commercial  sound-effects  library),  such  as  those  produced  by  doors  closing,  babies  crying, 
helicopters in flight, and other familiar events. These sounds, equated for peak levels, were recorded
with  a  background  of  broad-band  noise.  The  recorded  sounds  were  presented  to  groups  of  6  to  8 
listeners  in  both  open-set  and  closed-set  formats  (with  the  list  of  responses  displayed  continuously  in 
the  latter).  Confusion  matrices  were  generated  using  a  wide  range  of  event-to-noise  (Ev/N)  ratios. 
Two  frequent  confusions  were  identified  for  each  item,  and  were  used  to  create  a  three-alternative 
forced-choice  test.  Eight  values  of  Ev/N  were  selected  for  each  item  in  an  effort  to  achieve  uniform 
item  identifiability. 

We  have  found  a  wide  range  of  variation  in  identification  thresholds  and  in  the  steepness  of  the 
psychometric  functions  obtained  with  the  familiar  sounds.  The  lowest  thresholds  were  obtained  with 
sounds  that  have  a  well-defined  temporal  pattern.  It  appears  that  the  temporal  pattern  can  specify  a 
sound  source  with  a  relatively  small  amount  of  spectral  information  available.  Confusions  also 
indicated  that  listeners  tend  to  rely  on  temporal  characteristics  at  the  lower  S/N  ratios.  The  steepest 
functions  were  obtained  with  sounds  that  had  the  most  uniform  sustained  levels.  For  these  sounds, 
most  of  the  information  required  for  identification  became  audible  within  a  small  range  of  S/N  ratios, 
and  identification  performance  went  from  chance  to  near  perfect  with  very  small  increases  in  the  S/N 
ratio.  Individual  differences  were  somewhat  smaller  than  found  with  speech  perception.  The 
difference  between  average  thresholds  for  the  poorest  performing  decile  and  the  best-performing 
decile  was  approximately  4  dB,  as  compared  to  roughly  7  dB  for  speech.  Although  our  findings  are 
likely  to  have  been  affected  by  the  size  and  makeup  of  the  particular  set  of  sounds  we  have  utilized, 
the  tendencies  in  the  data  described  here  appear  to  be  ones  that  have  considerable  generality. 
(Partially  supported  by  NIH.) 

3.  Auditory  and  Visual  Speech  Perception:  Confirmation  of  a  Modality-Independent  Source  of 
Individual  Differences.  Watson,  W.  Qui,  M.  Chamberlain,  S.  Li. 

Two  experiments  were  run  to  determine  whether  individual  differences  in  auditory  speech 
recognition  abilities  are  predictable  from  those  for  speechreading  (lipreading),  using  a  total  of  90 
normal-hearing  college-student  subjects.  Tests  included  single  words  and  sentences,  recorded  on  a 
video  disk  by  a  male  actor  (Johns  Hopkins  Lipreading  Corpus,  Bernstein  and  Demorest,  1986).  The 
auditory  speech  was  presented  with  a  white  noise  masker,  at  -7  dB  Sp/N.  The  correlations  between 
overall  auditory  and  visual  performance  were  0.52  and  0.43  in  the  two  experiments,  suggesting  the 
existence  of  a  modality-independent  ability  to  perceive  linguistic  "wholes"  on  the  basis  of  linguistic 
fragments.  Subjects  in  the  second  experiment  also  identified  printed  sentences,  with  40-60% 
portions  of  the  printed  characters  deleted.  Performance  on  this  "Fragmented-Sentences  Test"  also 
correlated significantly with auditory and visual speech recognition. The existence of a
modality-independent source of variance in speech recognition abilities may be a partial explanation of the
difficulty in demonstrating strong associations between psychoacoustic measures of spectral or
temporal acuity, and speech discrimination or identification.


D.  Noise  Discrimination 

1.  Discriminability  of  noise  samples.  M.  Rickert,  D.  Robinson. 

In  previously  reported  work  [S.  F.  Coble  and  D.  E.  Robinson,  J.  Acoust.  Soc.  Am.  92, 2630-2635 
(1992);  M.  E.  Rickert  and  D.  E.  Robinson,  J.  Acoust.  Soc.  Am.  93, 2386(A)  (1993)]  listeners 
discriminated  among  trials  consisting  of  either  two  identical  samples  of  noise  or  two  nonidentical 
samples.  Nonidentical  samples  were  generated  by  replacing  a  segment  of  noise  presented  during  the 
first  interval  with  a  new  segment.  Although  the  long-term  power  spectrum  of  the  segments  was  the 
same,  the  temporal  position  at  which  segments  were  replaced  had  a  significant  effect  on 
discriminability:  Performance  was  best  when  changes  occurred  at  the  end  and  was  poorest  when 
changes  occurred  at  the  beginning.  In  the  present  study,  the  effects  of  temporal  position  were 
measured  under  two  spectral  conditions.  In  one  condition,  noise  samples  were  identically  filtered 
(100-3000  Hz  or  455-655  Hz)  for  the  entire  stimulus  duration  (50  ms).  The  effect  of  temporal 
position  is  reduced  with  narrow-band  stimuli  but  is  not  eliminated.  In  a  second  condition,  the 
bandwidth  was  varied  within  each  sample  such  that  one  segment  was  wideband  (100-3000  Hz)  and 
the  other  narrow  band  (455-655  Hz).  Overall  performance  with  mixed  stimuli  (1)  is  similar  to  that 
with  pure  wideband  noise  when  the  uncorrelated  segment  is  wideband,  and  (2)  is  similar  to  that  with 
pure  narrow-band  noise  when  the  uncorrelated  segment  is  narrow  band. 

In  the  noise-discrimination  experiments  described  above,  center  frequency  was  held  constant  at 
545  Hz.  An  additional  experiment  has  been  conducted  in  which  both  center  frequency  (545  Hz  and 
2000  Hz)  and  bandwidth  (200  Hz  and  1000  Hz)  were  varied.  Data  were  also  obtained  with  wideband 
(100  Hz)  noise.  Although  the  effect  of  temporal  position  was  consistent  across  stimulus  conditions, 
the  size  of  the  effect  was  reduced  for  narrower  bandwidths.  There  was  no  evidence  of  an  effect  of 
center  frequency. 

2.  Leaky  integrator  models.  D.  Robinson,  M.  Rickert. 

Two  models  have  been  developed  that  describe  the  results  of  our  noise-discrimination 
experiments.  In  each  model  the  waveforms  from  the  two  temporal  intervals  are  jittered  in  amplitude, 
filtered,  and  squared  or  rectified.  In  one  model,  the  resulting  waveforms  are  passed  through  a  leaky 
integrator  and  subtracted  from  one  another.  This  difference  waveform  is  squared  and  passed  through 
a  second  leaky  integrator.  A  sample  of  the  output  of  the  second-stage  leaky  integrator  taken  at  the 
end  of  the  integration  period  is  used  as  a  decision  variable  from  which  hit  and  false  alarm  rates  are 
obtained  and  d'  is  computed.  In  the  other  model,  the  squared  waveforms  are  multiplied  by  an 
exponential  weighting  function  before  subtraction.  A  signal-to-noise  statistic  is  then  used  to  obtain 
an  estimate  of  d'.  The  fitting  parameters  for  each  model  are  the  variance  of  the  internal  noise  process 
and  the  time  constant  of  either  the  second-stage  integrator  or  of  the  exponential  weighting  function. 
The  models  provide  an  excellent  fit  to  the  data  reported  by  Coble  and  Robinson.  They  provide 
quantitative  predictions  of  the  improvement  in  performance  as  the  target  (uncorrelated)  noise 
segment  is  moved  from  the  beginning  to  the  end  of  the  burst  and  they  predict  the  constant  ratio  of  the 
duration  of  the  target  segment  to  the  total  duration. 
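
The processing chain of the first model can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the amplitude jitter (internal noise) and the filtering stage are omitted, the sampling rate and both time constants are arbitrary illustrative values, and all function names are hypothetical. The d' helper uses the standard relation d' = z(hit rate) - z(false-alarm rate).

```python
import math
import random
from statistics import NormalDist


def leaky_integrate(x, tau_s, dt_s):
    """First-order leaky integrator: y[n] = a*y[n-1] + (1-a)*x[n]."""
    a = math.exp(-dt_s / tau_s)
    y, out = 0.0, []
    for v in x:
        y = a * y + (1.0 - a) * v
        out.append(y)
    return out


def decision_variable(interval1, interval2, tau1_s, tau2_s, dt_s):
    """Square, integrate, subtract, square, integrate again, sample at the end."""
    env1 = leaky_integrate([v * v for v in interval1], tau1_s, dt_s)
    env2 = leaky_integrate([v * v for v in interval2], tau1_s, dt_s)
    diff_sq = [(p - q) ** 2 for p, q in zip(env1, env2)]
    return leaky_integrate(diff_sq, tau2_s, dt_s)[-1]


def d_prime(hit_rate, false_alarm_rate):
    """Standard d' from hit and false-alarm rates (both strictly between 0 and 1)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def mean_dv(change_at_end, n_trials=50, seed=1):
    """Average decision variable when a 10-ms segment of a 50-ms burst is replaced.

    ASSUMPTIONS: 10-kHz sampling, unfiltered Gaussian noise, tau1 = 5 ms,
    tau2 = 20 ms; none of these values come from the report.
    """
    rng = random.Random(seed)
    fs, n, seg = 10_000, 500, 100
    total = 0.0
    for _ in range(n_trials):
        first = [rng.gauss(0.0, 1.0) for _ in range(n)]
        second = list(first)
        start = n - seg if change_at_end else 0
        second[start:start + seg] = [rng.gauss(0.0, 1.0) for _ in range(seg)]
        total += decision_variable(first, second, 0.005, 0.020, 1.0 / fs)
    return total / n_trials


if __name__ == "__main__":
    print("change at beginning:", mean_dv(change_at_end=False))
    print("change at end:      ", mean_dv(change_at_end=True))
```

Because the second-stage integrator forgets early differences before its output is sampled, the decision variable in this sketch is larger on average when the replaced segment is at the end of the burst, which is the direction of the temporal-position effect described above.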




E.  Speech  Perception 

1.  Formant  Frequency  Discrimination.  Kewley-Port  and  C.  Watson. 

Thresholds for formant-frequency discrimination, for F1 and F2, were obtained
for ten synthetic English vowels (Kewley-Port, 1990; Kewley-Port and Watson, 1993). In general,
threshold values of ΔF as a function of formant frequency are best described as a piecewise-linear
function which is constant at 14 Hz in the F1 frequency region (<800 Hz), and increases linearly in
the F2 region. In the F2 region, the resolution for formant frequency is about 1.5%. Minimal-
uncertainty thresholds are similar to the most accurate discrimination previously reported in the F1
region, but about a factor of three lower (more precise) in the F2 region. Thresholds were also
measured for one vowel, /I/, in a variety of consonantal contexts, /b, d, g, z, m, l/ (Kewley-Port and
Watson, 1991). For F1 and F2, the resulting thresholds were a factor of about 4-5 smaller than those
reported  by  Mermelstein  (1978).  Additional  experiments  estimated  formant  frequency  thresholds 
under  medium  stimulus  uncertainty  (Kewley-Port,  1992).  While  training  required  longer  to  approach 
asymptote,  and  levels  of  performance  were  higher  for  some  CVC’s  than  for  others,  final  thresholds 
were  generally  similar  to  those  obtained  for  isolated  vowels.  Apparently,  auditory  acuity  for  formant 
frequency  discrimination  for  well-trained  subjects  is  generally  the  same  for  vowels  in  isolation  and  in 
CVC  contexts,  under  both  minimal  and  medium  levels  of  stimulus  uncertainty. 
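
The piecewise description of the isolated-vowel thresholds can be written as a small function. This is a literal reading of the summary figures given above (14 Hz below 800 Hz, about 1.5% of formant frequency above it); the function and parameter names are hypothetical, and the fitted function in the cited papers may differ in detail. In particular, 1.5% of 800 Hz is 12 Hz, so this simplified version has a small step at the boundary.

```python
def formant_threshold_hz(
    formant_hz,
    f1_limit_hz=800.0,
    f1_threshold_hz=14.0,
    f2_fraction=0.015,
):
    """Approximate formant-frequency discrimination threshold (delta F) in Hz.

    ASSUMPTION: a constant threshold below f1_limit_hz and a threshold
    proportional to formant frequency at and above it.
    """
    if formant_hz < f1_limit_hz:
        return f1_threshold_hz
    return f2_fraction * formant_hz


if __name__ == "__main__":
    for f in (300, 500, 700, 1200, 2000, 2500):
        print(f, formant_threshold_hz(f))
```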

The  results  of  these  experiments  also  suggested  a  trend  for  higher  fundamental  frequency  to 
produce  increased  formant  thresholds  for  these  synthetic  female  vowels.  Further  studies  investigated 
the  effect  of  glottal  source  on  formant  thresholds  in  synthesized  male  vowels.  Thresholds  for 
formant-frequency  discrimination  were  obtained  for  six  vowels  with  two  fundamental  frequencies: 
high  (126  Hz)  and  low  (101  Hz).  Four  well-trained  subjects  performed  an  adaptive  tracking  task 
under minimal-uncertainty conditions. Discrimination thresholds for vowels with high F0 were
significantly greater than those with low F0. Consistent with results for female vowels, thresholds
for frequencies below 800 Hz in the F1 region appeared relatively constant while thresholds in the F2
region increased. There was a general trend across all male and female vowels for higher
fundamental  frequency  to  result  in  increased  discrimination  thresholds.  (Partially  supported  by  NIH.) 


2.  The  effect  of  discriminability  on  dimensional  interactions  of  pitch  with  vowel  and  consonant 
identity.  Surprenant,  Kewley-Port  and  Watson. 

The  ability  to  ignore  irrelevant  variation  in  the  speech  signal  is  essential  for  normalizing  across 
speakers  and  situations.  Past  research  has  indicated  that  irrelevant  variation  in  such  features  as  pitch 
and vowel quality caused more interference on a consonant classification task than the reverse.
Along with other more memory-oriented paradigms such as serial-list recall, this has been taken as
evidence  that  the  longer-lasting,  periodic  elements  of  sound  that  make  up  vowel  and  pitch 
information  remain  longer  in  auditory  memory,  thereby  interfering  with  judgments  about  consonant 
identity.  However,  in  most  of  these  demonstrations,  the  relative  discriminability  of  the  tokens  was 
not  controlled.  In  the  present  study,  discriminability  was  assessed  for  continua  of  pitch,  consonant, 
and  vowel  identity  for  stop-vowel  syllables.  Eight  well-trained  subjects  were  asked  to  make  speeded 
same-different  judgments  on  one  dimension  of  stimuli  that  varied  on  two  dimensions.  An 
interaction  for  both  response  time  and  accuracy  was  found  such  that  as  the  discriminability  of  the 
relevant  dimension  was  increased,  interference  from  the  irrelevant  dimension  decreased  and  vice 
versa.  Although  there  were  minor  variations,  the  same  general  pattern  was  observed  in  all 
conditions.  (Partially  supported  by  NIH.) 




F.  Methodology 

1. Robustness of psychophysical measures. Rickert, Robinson.

A  mathematical  analysis  of  several  well-known  measures  of  psychophysical  performance  has 
been  completed.  In  this  work  we  were  interested  in  the  degree  to  which  such  measures  as  d’,  P(C), 
P(C)max,  and  A’  are  "robust"  with  respect  to  violations  in  their  underlying  assumptions.  The  basic 
approach  was  to  investigate  how  each  of  these  measures  varies  with  changes  in  criterion  placement 
for  various  pairs  of  assumed  underlying  density  functions.  The  pairs  of  density  functions  investigated 
were  normal-normal,  exponential-exponential,  Chi  Square-Noncentral  Chi  Square,  and 
Rayleigh-Rice. 

Our  findings  indicate  that  measures  which  rely  on  the  assumption  of  equal-variance,  normal 
densities, such as d’ and P(C)max, are quite robust, particularly if extreme values of the criterion are
avoided; that is, changes in the criterion have small effects except for very low or very high false alarm
rates. The so-called non-parametric measure, A’, is not robust, and, in fact, shows large changes in
magnitude  as  the  criterion  is  varied  on  any  of  the  density  pairs  we  investigated. 
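
The kind of criterion sweep described above can be illustrated for the simplest case. The Python sketch below is not the authors' analysis. It assumes equal-variance normal densities separated by one standard deviation, and it uses a commonly cited computing formula for A’ (valid when the hit rate is at least the false-alarm rate). The function names, the separation, and the criterion values are all chosen here for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # noise density: mean 0, sd 1


def rates(separation: float, criterion: float) -> tuple[float, float]:
    """Hit and false-alarm rates for equal-variance normal densities."""
    signal = NormalDist(mu=separation, sigma=1.0)
    hit = 1.0 - signal.cdf(criterion)
    false_alarm = 1.0 - STANDARD_NORMAL.cdf(criterion)
    return hit, false_alarm


def d_prime(hit: float, false_alarm: float) -> float:
    """d' = z(hit) - z(false alarm)."""
    return STANDARD_NORMAL.inv_cdf(hit) - STANDARD_NORMAL.inv_cdf(false_alarm)


def a_prime(hit: float, false_alarm: float) -> float:
    """A' computing formula, assumed here; requires hit >= false_alarm."""
    diff = hit - false_alarm
    return 0.5 + diff * (1.0 + diff) / (4.0 * hit * (1.0 - false_alarm))


def sweep(separation=1.0, criteria=(-1.0, 0.0, 0.5, 1.0, 2.0, 3.0)):
    """Return (criterion, false-alarm rate, d', A') for each criterion."""
    rows = []
    for criterion in criteria:
        hit, false_alarm = rates(separation, criterion)
        rows.append((criterion, false_alarm,
                     d_prime(hit, false_alarm), a_prime(hit, false_alarm)))
    return rows


if __name__ == "__main__":
    print("criterion  P(FA)    d'      A'")
    for criterion, fa, dp, ap in sweep():
        print(f"{criterion:8.2f}  {fa:6.4f}  {dp:6.3f}  {ap:6.4f}")
```

In this normal-normal case d’ stays fixed at the assumed separation while A’ drifts as the criterion moves. The other density pairs named above (exponential, chi-square, Rayleigh-Rice) would need their own distribution functions and are not covered by this sketch.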


Personnel 

Name                          Position Title                 Department

Charles S. Watson, Ph.D.      Professor                      Speech and Hearing Sciences
Donald E. Robinson, Ph.D.     Professor                      Psychology
Gary R. Kidd, Ph.D.           Associate Scientist            Speech and Hearing Sciences
Diane Kewley-Port, Ph.D.      Associate Professor            Speech and Hearing Sciences
Aimee M. Surprenant           Assistant Professor            Psychology, Purdue University
Sheldon Li, Ph.D.             Research Associate             Speech and Hearing Sciences
Ward R. Drennan               Graduate Research Assistant    Speech and Hearing Sciences
Yijian Zheng                  Graduate Research Assistant    Speech and Hearing Sciences
Brian Gygi                    Graduate Research Assistant    Psychology
Martin E. Rickert             Graduate Research Assistant    Psychology


Manuscripts  and  Publications 

Coble, S. F., & Robinson, D. E. (1992). Discriminability of bursts of reproducible noise. Journal
of the Acoustical Society of America, 92, 2630-2635.

Kidd, G. R., & Watson, C. S. (1992). The "proportion-of-the-total-duration (PTD) rule" for the
discrimination of auditory patterns. Journal of the Acoustical Society of America, 92, 3109-3118.

Espinoza-Varas, B., & Watson, C. S. (1994). Effects of decision criterion on response latencies of
binary decisions. Perception and Psychophysics, 55, 190-203.

Kewley-Port, D., & Watson, C. S. (1994). Formant-frequency discrimination for isolated English
vowels. Journal of the Acoustical Society of America, 95, 485-496.




Kewley-Port, D. (1995). Thresholds for formant-frequency discrimination of vowels in consonantal
context. Journal of the Acoustical Society of America, 97, 3139-3146.

Kidd, G. R. (1995). Proportional duration and proportional variance as factors in auditory pattern
discrimination. Journal of the Acoustical Society of America, 97, 1335-1338.

Hirsch, I. J., & Watson, C. S. (1996). Recent research in psychoacoustics. Annual Review of
Psychology.

Kidd, G. R., & Watson, C. S. (1996). Detection of frequency changes in transposed sequences of
tones. Journal of the Acoustical Society of America, 99, 553-566.

Watson, C. S., Qui, W. W., Chamberlain, M., & Li, X. (1996). Auditory and Visual Speech
Perception: Confirmation of a Modality-Independent Source of Individual Differences.
Submitted to the Journal of the Acoustical Society of America, 100, 1153-1162.

Watson,  C.  S.,  &  Kidd,  G.  R.  (in  press).  The  perception  of  complex  waveforms.  In  Crocker,  M.  J. 
(Ed.)  Handbook  of  Acoustics.  New  York:  Wiley. 

Watson, C. S., & Li, X. Effects of spectral and temporal uncertainty on the detection of increments
in the level of individual tonal components of "profile" stimuli (in preparation for J. Acoust.
Soc. Am.).


Oral  Presentations 

Kidd, G. R., & Watson, C. S. (1992). The proportion-of-the-total-duration (PTD) rule holds for
duration discrimination. J. Acoust. Soc. Am., 92, Pt. 2, 2318.

Watson, C. S., Qui, W. W., & Chamberlain, M. (1992). Correlations between auditory and visual
speech processing ability: Evidence for a modality-independent source of variance. J. Acoust.
Soc. Am., 92, Pt. 2, 2385.

Watson, C. S., & Li, X. (1992). Effects of spectral and temporal uncertainty on the detection of
increments in the level of individual tonal components of "profile" stimuli. J. Acoust. Soc. Am.,
92, Pt. 2, 2319.

Kidd, G. R. (1993). Temporally directed attention in the detection and discrimination of auditory
pattern components. J. Acoust. Soc. Am., 93, Pt. 2, 2315.

Port, R. F., Rogers, C. L., Watson, C. S., & Kidd, G. R. (1993). The effects of training method on
frequency discrimination for individual components of complex tonal patterns. J. Acoust. Soc.
Am., 93, Pt. 2, 2315.

Rickert,  M.  E.  and  Robinson,  D.  E.  (1993).  Stimulus-oriented  model  for  the  discrimination  of 
Gaussian  noise  samples,  J.  Acoust.  Soc.  Am.,  91,  S12  (A). 

Robinson,  D.  E.  and  Rickert,  M.  E.  (1993).  A  stimulus-oriented  model  for  discrimination  of  Gaussian 
noise  samples,  revisited  (again),  AFOSR  Conference,  Dayton,  OH. 




Watson, C. S., Kidd, G. R., Surprenant, A., & Drennan, W. R. (1993). Use of the psychophysical
method of adjustment in tonal pattern discrimination. J. Acoust. Soc. Am., 93, Pt. 2, 2315.

Watson, C. S., Kidd, G. R., Surprenant, A., & Drennan, W. R. (1993). Properties of the structure of
multi-tone sequential patterns that determine the difficulty of perceptually isolating single target
components. J. Acoust. Soc. Am., 93, Pt. 2, 2315.

Kewley-Port, D., Li, X., Zheng, Y., & Beardsley, A. (1994). Fundamental frequency effects on
thresholds for vowel formant discrimination. J. Acoust. Soc. Am., 95, No. 5, Pt. 2, 2978.

Surprenant, A., & Kewley-Port, D. (1994). The effect of discriminability on dimensional
interactions of pitch with vowel and consonant identity. J. Acoust. Soc. Am., 95, No. 5, Pt. 2,
2975.

Watson, C. S., & Kidd, G. R. (1994). Factors in the Design of Effective Auditory Displays. Paper
presented at the 2nd International Conference on Auditory Display, Santa Fe, New Mexico,
November 1994.

Watson, C. S., Li, X., Kidd, G. R., & Zheng, Y. (1994). Selective attention to spectral-temporal
regions of auditory patterns. J. Acoust. Soc. Am., 95, Pt. 2, 2963.

Surprenant, A. M., & Watson, C. S. (1995). Individual differences in speech and nonspeech
processing among normal-hearing subjects. Journal of the Acoustical Society of America, 97, Pt.
2, 3275.

Watson, C. S., & Drennan, W. R. (1995). Discrimination of static versus dynamic, and log versus
harmonic profiles. Journal of the Acoustical Society of America, 97, Pt. 2, 3272.

Rickert, M. E., & Robinson, D. E. (1995). Noise discriminability. I. A comparison of the effects of
temporal position and bandwidth. J. Acoust. Soc. Am., 98, Pt. 2, 2906.

Robinson, D. E., & Rickert, M. E. (1995). Noise discriminability. II. Leaky integrator models. J. Acoust.
Soc. Am., 98, Pt. 2, 2906.

Watson, C. S., & Zheng, Y. (1995). Determinants of the perceptual similarity of complex filter
shapes. J. Acoust. Soc. Am., Pt. 2, 2927.

Rickert, M. E., & Robinson, D. E. (1996). The effects of center frequency and bandwidth on the
discriminability of noise. J. Acoust. Soc. Am., 99, Pt. 2, 2543.

Drennan, W. R., & Watson, C. S. (1996). Discrimination of harmonic and log spaced profiles and
static and dynamic profiles for both good and poor profile listeners. J. Acoust. Soc. Am., 99, Pt.
2, 2565.

Watson, C. S., Kidd, G. R., & Gygi, B. (1996). Detection and identification of environmental
sounds. J. Acoust. Soc. Am., 99, Pt. 2, 2516.




